Model Selection and the Principle of Minimum Description Length

Authors

  • Mark H. Hansen
  • Bin Yu
Abstract

This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis. Because model selection in linear regression is an extremely common problem that arises in many applications, we present detailed derivations of several MDL criteria in this context and discuss their properties through a number of examples. Our emphasis throughout this paper is on the practical application of MDL, and hence we make extensive use of real data sets. In writing this review, we tried to make the descriptive philosophy of MDL natural to a statistics audience by examining classical problems in model selection. In the engineering literature, however, MDL is being applied to ever more exotic modeling situations. As a principle for statistical modeling in general, one strength of MDL is that it can be intuitively extended to provide useful tools for new problems.
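To make the idea of comparing models by description length concrete, the following is a minimal sketch and not one of the criteria derived in the paper. It assumes the familiar two-part approximation for a Gaussian linear model with k coefficients and n observations, (n/2) log(RSS/n) + (k/2) log n, measured in nats. The function names `two_part_mdl` and `select_degree` and the simulated quadratic data are hypothetical, introduced only for illustration.

```python
import numpy as np


def two_part_mdl(y, X):
    """Approximate two-part description length (in nats) of y under a
    Gaussian linear model with design matrix X (an illustrative form,
    assumed here, not taken from the paper)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    data_cost = 0.5 * n * np.log(rss / n)   # cost of describing the residuals
    model_cost = 0.5 * k * np.log(n)        # cost of describing k coefficients
    return data_cost + model_cost


def select_degree(x, y, max_degree=6):
    """Score polynomial fits of degree 0..max_degree and return the
    degree with the shortest approximate description length."""
    scores = {}
    for d in range(max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)  # columns 1, x, ..., x^d
        scores[d] = two_part_mdl(y, X)
    best = min(scores, key=scores.get)
    return best, scores


# Simulated data (hypothetical): a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

best_degree, scores = select_degree(x, y)
print("selected degree:", best_degree)
```

Each added coefficient shortens the description of the residuals but lengthens the description of the model, so the selected degree is the one where the total is smallest. In this sketch, underfitting models (degree 0 or 1) pay a large residual cost, while higher degrees pay a growing parameter cost.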

Similar resources

The selection of the best from climate change model in the estimation of climatology variables for east region of the country by use fifth report data

Climate change is now a major concern in water-related fields because it may cause more severe, shorter, or longer droughts or floods in the future. This research sought to determine which climate change model best estimates minimum temperature, maximum temperature, and precipitation at the Birjand synoptic station. For this r...

Introduction to Minimum Encoding Inference

This paper examines the minimum encoding approaches to inference, Minimum Message Length (MML) and Minimum Description Length (MDL). This paper was written with the objective of providing an introduction to this area for statisticians. We describe coding techniques for data, and examine how these techniques can be applied to perform inference and model selection.

Model selection using wavelet decomposition and applications

In this paper we discuss how to use wavelet decomposition to select a regression model. The methodology relies on a minimum description length criterion which is used to determine the number of non-zero coefficients in the vector of wavelet coefficients. Consistency properties of the selection rule are established and simulation studies reveal information on the distribution of the minimum desc...

Computing Minimum Description Length for Robust Linear Regression Model Selection

A minimum description length (MDL) and stochastic complexity approach for model selection in robust linear regression is studied in this paper. Computational aspects and implementation of this approach to practical problems are the focuses of the study. Particularly, we provide both algorithms and a package of S language programs for computing the stochastic complexity and proceeding with the a...

Publication date: 1998